In this notebook, a template is provided for you to implement, in stages, the functionality required to complete this project. If additional code is needed that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin implementing your project. Note that some implementation sections are optional and are marked with 'Optional' in the header.
In addition to implementing code, you must answer questions that relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide a thorough answer in the text box that begins with 'Answer:'. Your project submission will be evaluated on both your answers and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.
Design and implement a deep learning model that learns to recognize sequences of digits. Train the model using synthetic data generated by concatenating character images from notMNIST or MNIST. To produce a synthetic sequence of digits for testing, you can, for example, limit yourself to sequences of up to five digits and use five classifiers on top of your deep network. You would have to incorporate an additional 'blank' character to account for shorter number sequences.
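The blank-padding scheme described above can be sketched in a few lines. `encode_label`, `to_one_hot`, and the choice of class 10 for the blank are illustrative assumptions for this sketch, not part of MNIST or any library:

```python
import numpy as np

BLANK = 10   # extra class for "no digit" positions
MAX_LEN = 5  # sequences of up to five digits

def encode_label(digits):
    """Pad a digit sequence to MAX_LEN with the blank class.

    Each of the MAX_LEN classifier heads then predicts one of
    11 classes: the digits 0-9 plus the blank."""
    assert len(digits) <= MAX_LEN
    return list(digits) + [BLANK] * (MAX_LEN - len(digits))

def to_one_hot(label, num_classes=11):
    """One-hot encode the per-position targets for the classifier heads."""
    eye = np.eye(num_classes, dtype=np.float32)
    return eye[label]  # shape: (MAX_LEN, num_classes)

print(encode_label([1, 2, 3]))  # [1, 2, 3, 10, 10]
```

This way a shorter number such as "123" still yields exactly five targets, so the same five softmax heads can be trained on every sample.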
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. (video)
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
from __future__ import print_function
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM
from keras.utils import np_utils
# Training parameters.
batch_size = 32
nb_classes = 10
nb_epochs = 5
# Embedding dimensions.
row_hidden = 128
col_hidden = 128
# The data, shuffled and split between train and test sets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
import os
import sys
import math
import random
import tarfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from time import time
from scipy import ndimage
from IPython.display import display, Image
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
%matplotlib inline
# A blank 28x28 image that stands in for the 'blank' character (class 10).
blank = np.zeros((28, 28), dtype=np.float32)
fig = plt.figure()
plt.imshow(blank)
plt.show()
# 60,000 random indices into the training set.
int1 = [random.randint(1, 59999) for _ in range(60000)]
print(len(int1))
int2 = [random.randint(1, 59999) for _ in range(60000)]
print(len(int2))
int3 = [random.randint(1, 59999) for _ in range(60000)]
print(len(int3))
# 10,000 random training-set positions that will be replaced by blanks.
k1 = [random.randint(1, 60000) for _ in range(10000)]
print(len(k1))
k2 = [random.randint(1, 60000) for _ in range(10000)]
print(len(k2))
k3 = [random.randint(1, 60000) for _ in range(10000)]
print(len(k3))
image_width = 28
image_height = 28
dataset_size = X_train.shape[0]

def createSequences():
    # Copy the training digits, replacing positions listed in k1 with the
    # blank image (label '10').
    blanks = set(k1)  # set membership is O(1); scanning the list per image is very slow
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = X_train[i]
            data_labels.append(str(y_train[i]))
    return dataset, data_labels

dataset1, data_labels_train1 = createSequences()
print(dataset1.shape)
print(len(data_labels_train1))
print(data_labels_train1[1])
dataset_size = X_train.shape[0]

def createSequences():
    # Build a shuffled copy of dataset1 by sampling random indices.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset1[j]
        data_labels.append(str(data_labels_train1[j]))
    return dataset, data_labels

dataset2_temp, data_labels_train2_temp = createSequences()
print(dataset2_temp.shape)
print(len(data_labels_train2_temp))
print(data_labels_train2_temp[1])
X_train1 = dataset2_temp
Y_train1 = data_labels_train2_temp
print(X_train1.shape)
dataset_size = X_train.shape[0]

def createSequences():
    # Replace positions listed in k2 with the blank image.
    blanks = set(k2)
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = dataset2_temp[i]
            data_labels.append(str(data_labels_train2_temp[i]))
    return dataset, data_labels

dataset2, data_labels_train2 = createSequences()
print(dataset2.shape)
print(len(data_labels_train2))
print(data_labels_train2[1])
dataset_size = X_train.shape[0]

def createSequences():
    # Build a shuffled copy of dataset2.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset2[j]
        data_labels.append(str(data_labels_train2[j]))
    return dataset, data_labels

dataset3_temp, data_labels_train3_temp = createSequences()
print(dataset3_temp.shape)
print(len(data_labels_train3_temp))
print(data_labels_train3_temp[1])
X_train2 = dataset3_temp
Y_train2 = data_labels_train3_temp
dataset_size = X_train.shape[0]

def createSequences():
    # Replace positions listed in k3 with the blank image.
    blanks = set(k3)
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = dataset3_temp[i]
            data_labels.append(str(data_labels_train3_temp[i]))
    return dataset, data_labels

dataset3, data_labels_train3 = createSequences()
print(dataset3.shape)
print(len(data_labels_train3))
print(data_labels_train3[1])
dataset_size = X_train.shape[0]

def createSequences():
    # Build a shuffled copy of dataset3.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset3[j]
        data_labels.append(str(data_labels_train3[j]))
    return dataset, data_labels

dataset4_temp, data_labels_train4_temp = createSequences()
print(dataset4_temp.shape)
print(len(data_labels_train4_temp))
print(data_labels_train4_temp[1])
X_train3 = dataset4_temp
Y_train3 = data_labels_train4_temp
print(X_train3.shape)
X1 = X_train1
X2 = X_train2
X3 = X_train3
dataset_size = X_train.shape[0]

def createSequences():
    # Stack the three digit streams side by side into 28x84 images and build
    # length-first labels: (length, d1, d2, d3, blank, blank).
    dataset = np.ndarray(shape=(dataset_size, image_height, 84), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        dataset[i, :, :] = np.hstack([X1[i], X2[i], X3[i]])
        data_labels.append((3, int(Y_train1[i]), int(Y_train2[i]), int(Y_train3[i]), 10, 10))
    return dataset, data_labels

complete_dataset, Y_train_new = createSequences()
print(len(Y_train_new))
print(Y_train_new[1])
print(complete_dataset.shape)
# Random index pools into the test set (4,998 entries each).
r1 = [random.randint(1, 4999) for _ in range(4998)]
print(len(r1))
print(r1[100])
r2 = [random.randint(1, 4999) for _ in range(4998)]
print(len(r2))
print(r2[100])
r3 = [random.randint(1, 4999) for _ in range(4998)]
print(len(r3))
print(r3[100])
# Test-set positions that will be replaced by blanks (499 entries each).
m1 = [random.randint(1, 4999) for _ in range(499)]
print(len(m1))
print(m1[100])
m2 = [random.randint(1, 4999) for _ in range(499)]
print(len(m2))
print(m2[100])
m3 = [random.randint(1, 4999) for _ in range(499)]
print(len(m3))
print(m3[100])
dataset_size = 10000

def createSequences():
    # Copy the test digits, replacing positions listed in m1 with the blank image.
    blanks = set(m1)
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = X_test[i]
            data_labels.append(str(y_test[i]))
    return dataset, data_labels

dataset_test1, data_labels_test1 = createSequences()
print(dataset_test1.shape)
print(len(data_labels_test1))
print(data_labels_test1[1])
dataset_size = 10000

def createSequences():
    # Build a shuffled copy of dataset_test1.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels_test = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset_test1[j]
        data_labels_test.append(str(data_labels_test1[j]))
    return dataset, data_labels_test

X_test1, Y_test1 = createSequences()
print(X_test1.shape)
print(len(Y_test1))
print(Y_test1[1])
dataset_size = 10000

def createSequences():
    # Replace positions listed in m2 with the blank image.
    blanks = set(m2)
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = X_test[i]
            data_labels.append(str(y_test[i]))
    return dataset, data_labels

dataset_test2, data_labels_test2 = createSequences()
print(dataset_test2.shape)
print(len(data_labels_test2))
print(data_labels_test2[1])
dataset_size = 10000

def createSequences():
    # Build a shuffled copy of dataset_test2.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels_test = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset_test2[j]
        data_labels_test.append(str(data_labels_test2[j]))
    return dataset, data_labels_test

X_test2, Y_test2 = createSequences()
print(X_test2.shape)
print(len(Y_test2))
print(Y_test2[1])
dataset_size = 10000

def createSequences():
    # Replace positions listed in m3 with the blank image.
    blanks = set(m3)
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        if i in blanks:
            dataset[i, :, :] = blank
            data_labels.append('10')
        else:
            dataset[i, :, :] = X_test[i]
            data_labels.append(str(y_test[i]))
    return dataset, data_labels

dataset_test3, data_labels_test3 = createSequences()
print(dataset_test3.shape)
print(len(data_labels_test3))
print(data_labels_test3[1])
dataset_size = 10000

def createSequences():
    # Build a shuffled copy of dataset_test3.
    dataset = np.ndarray(shape=(dataset_size, image_height, image_width), dtype=np.float32)
    data_labels_test = []
    for i in range(dataset_size):
        j = random.randint(0, dataset_size - 1)
        dataset[i, :, :] = dataset_test3[j]
        data_labels_test.append(str(data_labels_test3[j]))
    return dataset, data_labels_test

X_test3, Y_test3 = createSequences()
print(X_test3.shape)
print(len(Y_test3))
print(Y_test3[1])
XX=X_test1
XY=X_test2
XZ=X_test3
print(X_test1.shape)
num_pixels = X_test1.shape[1]*X_test1.shape[2]
X_test1 = X_test1.reshape(X_test1.shape[0], num_pixels).astype('float32')
X_test2 = X_test2.reshape(X_test2.shape[0], num_pixels).astype('float32')
X_test3 = X_test3.reshape(X_test3.shape[0], num_pixels).astype('float32')
X_test1 /= 255
X_test2 /= 255
X_test3 /= 255
#X_test /= 255
print(X_test1.shape)
dataset_size = 10000

def createSequences():
    # Same stacking for the test set: 28x84 images plus length-first labels.
    dataset = np.ndarray(shape=(dataset_size, image_height, 84), dtype=np.float32)
    data_labels = []
    for i in range(dataset_size):
        dataset[i, :, :] = np.hstack([XX[i], XY[i], XZ[i]])
        data_labels.append((3, int(Y_test1[i]), int(Y_test2[i]), int(Y_test3[i]), 10, 10))
    return dataset, data_labels

test_dataset, Y_test_new = createSequences()
print(len(Y_test_new))
print(Y_test_new[1])
X_train = complete_dataset.reshape(complete_dataset.shape[0], 2352).astype('float32')
X_test = test_dataset.reshape(test_dataset.shape[0], 2352).astype('float32')
dataset_size = 10000

def createSequences():
    # Split the six-tuple test labels into six per-position lists.
    data_labels = [[] for _ in range(6)]
    for i in range(dataset_size):
        for j in range(6):
            data_labels[j].append(Y_test_new[i][j])
    return data_labels

y_1, y_2, y_3, y_4, y_5, y_6 = createSequences()
print(len(y_1))
print(y_1[1])
# The training labels must be split per position as well before one-hot encoding.
y1, y2, y3, y4, y5, y6 = ([row[j] for row in Y_train_new] for j in range(6))
Y1 = np_utils.to_categorical(y1, 11)
Y2 = np_utils.to_categorical(y2, 11)
Y3 = np_utils.to_categorical(y3, 11)
Y4 = np_utils.to_categorical(y4, 11)
Y5 = np_utils.to_categorical(y5, 11)
Y6 = np_utils.to_categorical(y6, 11)
Y_1 = np_utils.to_categorical(y_1, 11)
Y_2 = np_utils.to_categorical(y_2, 11)
Y_3 = np_utils.to_categorical(y_3, 11)
Y_4 = np_utils.to_categorical(y_4, 11)
Y_5 = np_utils.to_categorical(y_5, 11)
Y_6 = np_utils.to_categorical(y_6, 11)
print(len(Y_train_new))
print(X_train.shape)
print(test_dataset.shape)
fig=plt.figure()
plt.imshow(test_dataset[1])
plt.show()
X_train1 = complete_dataset.reshape(complete_dataset.shape[0], 28,84,1).astype('float32')
X_test1 = test_dataset.reshape(test_dataset.shape[0], 28,84,1).astype('float32')
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
inputs = Input(shape=(28, 84, 1))
conv = Convolution2D(32, 5, 5, border_mode='same', activation='relu')
conv1 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')
conv2 = Convolution2D(128, 5, 5, border_mode='same', activation='relu')
x = conv(inputs)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = conv1(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = conv2(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(1064, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(800, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(600, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(400, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(200, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(164, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(32, activation='relu')(x)
# Six softmax heads, one per label position: (length, d1, d2, d3, blank, blank).
predictions1 = Dense(11, activation='softmax')(x)
predictions2 = Dense(11, activation='softmax')(x)
predictions3 = Dense(11, activation='softmax')(x)
predictions4 = Dense(11, activation='softmax')(x)
predictions5 = Dense(11, activation='softmax')(x)
predictions6 = Dense(11, activation='softmax')(x)
model = Model(input=inputs, output=[predictions1, predictions2, predictions3, predictions4, predictions5, predictions6])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train1, [Y1, Y2, Y3, Y4, Y5, Y6], batch_size=batch_size, nb_epoch=nb_epochs, verbose=2)
print('started............')
scores = model.evaluate(X_test1, [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6], verbose=0)
print('Test loss and Test Accuracy', scores)
What approach did you take in coming up with a solution to this problem?
Handwritten digit recognition is a typical image classification problem, and convolutional neural networks (convnets) have shown excellent results on a wide range of image classification tasks. Many researchers have used convnets to solve image classification problems. Md. Shopon et al., in "Bangla handwritten digit recognition using autoencoder and deep convolutional neural network", used convnets for Bangla handwritten digit recognition. Chuankun Li et al., in "Joint Distance Maps Based Action Recognition with Convolutional Neural Network", used convnets to encode the spatio-temporal information of skeleton sequences into color texture images, referred to as Joint Distance Maps (JDMs). Van Tuan Nguyen, in "ConvNets and AGMM based real-time human detection under fisheye camera for embedded surveillance", proposed a ConvNets-based YOLO model for object (including human) detection. Pichao Wang et al., in "Action Recognition From Depth Maps Using Deep Convolutional Neural Networks", proposed convnets for human action recognition. Vladimir Risojevic et al., in "Analysis of learned features for remote sensing image classification", used convnets to analyze learned features for remote sensing image classification.
The model is built from convolutional networks (convnets); I used three classifier heads on top of the convnet to recognize sequences of up to three digits.
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.)
The model is designed using the Keras API.
The model consists of a 14-layer convnet:
Layer 1: convolutional layer with 32 filters of size 5x5
Layer 2: max-pooling layer with pool size 2x2
Layer 3: convolutional layer with 64 filters of size 5x5
Layer 4: max-pooling layer with pool size 2x2
Layer 5: convolutional layer with 128 filters of size 5x5
Layer 6: max-pooling layer with pool size 2x2
Dropout of 0.2
Layer 7: fully connected layer with 1064 outputs
Dropout of 0.2
Layer 8: fully connected layer with 800 outputs
Dropout of 0.2
Layer 9: fully connected layer with 600 outputs
Dropout of 0.2
Layer 10: fully connected layer with 400 outputs
Dropout of 0.2
Layer 11: fully connected layer with 200 outputs
Dropout of 0.2
Layer 12: fully connected layer with 164 outputs
Dropout of 0.2
Layer 13: fully connected layer with 32 outputs
Layer 14: six parallel softmax output layers with 11 outputs each
In our convolutional neural network (CNN), we use 3 convolutional (CNV) layers, each followed by a max-pooling (PL) layer, followed by 7 fully connected (FC) layers as listed above.
The network receives 28x84 inputs from the synthetic dataset. We use 32 5x5 filters in the first CNV layer, 64 5x5 filters in the second, and 128 5x5 filters in the third.
After each CNV layer we apply max(0, x) (the rectified linear unit), followed by 2x2 maximum subsampling: for every 2x2 window, the maximum value within the window is passed to the next layer.
The first FC layer has 1064 units; the following FC layers reduce the size to 800, 600, 400, 200, 164 and finally 32 units. Each output layer is a simple softmax classifier over 11 classes (the digits 0-9 plus the blank), which is the number of classification categories in the synthetic dataset.
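The ReLU-then-2x2-max-subsampling step described above can be illustrated with plain NumPy. This is a toy sketch independent of the Keras layers used in the notebook; the helper names are illustrative:

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise
    return np.maximum(0.0, x)

def max_pool_2x2(x):
    # For every non-overlapping 2x2 window, keep the maximum value.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[ 1.0, -2.0,  3.0,  0.0],
              [-1.0,  5.0, -3.0,  2.0],
              [ 0.5,  0.25, -4.0, 1.0],
              [-0.5,  0.75,  2.0, -1.0]])
print(max_pool_2x2(relu(a)))
```

A 4x4 activation map thus becomes 2x2, halving each spatial dimension, exactly what each `MaxPooling2D(pool_size=(2, 2))` layer does in the model.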
How did you train your model? How did you generate your synthetic dataset? Include examples of images from the synthetic data you constructed.
I trained the model using a three-digit synthetic dataset prepared from the MNIST training set.
GENERATION OF SYNTHETIC DATASET
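The construction used above can be condensed into one helper: stack three 28x28 images side by side into a 28x84 image and record the label as (length, d1, d2, d3, blank, blank). Random arrays stand in for MNIST digits so this sketch runs standalone; `make_three_digit_sample` and `BLANK` are illustrative names, not part of the notebook's pipeline:

```python
import numpy as np

BLANK = 10  # class id used for "no digit" positions

def make_three_digit_sample(img1, img2, img3, d1, d2, d3):
    """Concatenate three 28x28 digit images into one 28x84 sequence image."""
    image = np.hstack([img1, img2, img3])  # shape: (28, 84)
    label = (3, d1, d2, d3, BLANK, BLANK)  # length-first label, padded with blanks
    return image, label

# Stand-ins for MNIST digits (the real code indexes X_train instead).
digits = [np.random.rand(28, 28).astype(np.float32) for _ in range(3)]
image, label = make_three_digit_sample(*digits, 4, 0, 7)
print(image.shape, label)  # (28, 84) (3, 4, 0, 7, 10, 10)
```

Replacing one of the three input images with an all-zero array and its digit with `BLANK` yields the shorter-sequence samples the blank class exists for.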
Once you have settled on a good architecture, you can train your model on real data. In particular, the Street View House Numbers (SVHN) dataset is a good large-scale dataset collected from house numbers in Google Street View. Training on this more challenging dataset, where the digits are not neatly lined-up and have various skews, fonts and colors, likely means you have to do some hyperparameter exploration to perform well.
Implementation
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import tarfile
import tensorflow as tf
from IPython.display import Image
from scipy import ndimage
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
%matplotlib inline
url = 'http://ufldl.stanford.edu/housenumbers/'
last_percent_reported = None
def download_progress_hook(count, blockSize, totalSize):
    """
    A hook to report the progress of a download. This is mostly intended for users
    with slow internet connections. Reports every 1% change in download progress.
    """
    global last_percent_reported
    percent = int(count * blockSize * 100 / totalSize)
    if last_percent_reported != percent:
        if percent % 5 == 0:
            sys.stdout.write("%s%%" % percent)
            sys.stdout.flush()
        else:
            sys.stdout.write(".")
            sys.stdout.flush()
        last_percent_reported = percent
def maybe_download(filename, force=False):
    """
    Download a file if not present, and make sure it's the right size.
    """
    if force or not os.path.exists(filename):
        print('Attempting to download:', filename)
        filename, _ = urlretrieve(url + filename, filename, reporthook=download_progress_hook)
        print('\nDownload Complete!')
    else:
        print(filename, 'is already downloaded. Skipped.')
    return filename
train_filename = maybe_download('train.tar.gz')
test_filename = maybe_download('test.tar.gz')
extra_filename = maybe_download('extra.tar.gz')
np.random.seed(133)
def maybe_extract(file_, force=False):
    filename = os.path.splitext(os.path.splitext(file_)[0])[0]  # remove .tar.gz
    if os.path.isdir(filename) and not force:
        # You may override by setting force=True.
        print('%s is already present - skipping extraction of %s.' % (filename, file_))
    else:
        print('Extracting %s file data. Please wait...' % file_)
        tar = tarfile.open(file_)
        sys.stdout.flush()
        tar.extractall()
        tar.close()
        print('File %s is successfully extracted into %s directory.' % (file_, filename))
    return filename
train_folder = maybe_extract(train_filename)
test_folder = maybe_extract(test_filename)
extra_folder = maybe_extract(extra_filename)
def remove_anomaly_samples(data, max_class_length=5):
    """
    Remove all samples whose class length exceeds the specified value.
    """
    print("\nDataset size before update:", len(data))
    # Iterate in reverse so deletions do not shift the indices still to be visited.
    for i in reversed(range(len(data))):
        if len(data[i]['label']) > max_class_length:
            print("\nAnomaly at index %d detected. Class size: %d" % (i, len(data[i]['label'])))
            del data[i]
    print("\nDataset size after update:", len(data))
    return data
import h5py

class DigitStructsWrapper:
    def __init__(self, file_, start_=0, end_=0):
        self.file_ = h5py.File(file_, 'r')
        self.names = self.file_['digitStruct']['name'][start_:end_] if end_ > 0 else self.file_['digitStruct']['name']
        self.bboxes = self.file_['digitStruct']['bbox'][start_:end_] if end_ > 0 else self.file_['digitStruct']['bbox']
        self.collectionSize = len(self.names)
        print("\n%s file structure contains %d entries" % (file_, self.collectionSize))

    def bboxHelper(self, keys_):
        """
        Handles the coding difference when there is exactly one bbox or an array of bboxes.
        """
        if len(keys_) > 1:
            val = [self.file_[keys_.value[j].item()].value[0][0] for j in range(len(keys_))]
        else:
            val = [keys_.value[0][0]]
        return val

    def getBbox(self, n):
        """
        Returns a dict of data for the n-th bbox.
        """
        bbox = {}
        bb = self.bboxes[n].item()
        bbox['height'] = self.bboxHelper(self.file_[bb]["height"])
        bbox['left'] = self.bboxHelper(self.file_[bb]["left"])
        bbox['top'] = self.bboxHelper(self.file_[bb]["top"])
        bbox['width'] = self.bboxHelper(self.file_[bb]["width"])
        bbox['label'] = self.bboxHelper(self.file_[bb]["label"])
        return bbox

    def getName(self, n):
        """
        Returns the filename for the n-th digitStruct. Since each letter is stored
        as an array of ASCII codes, we convert it back by calling chr.
        """
        return ''.join([chr(c[0]) for c in self.file_[self.names[n][0]].value])

    def getNumberStructure(self, n):
        s = self.getBbox(n)
        s['name'] = self.getName(n)
        return s

    def getAllNumbersStructure(self):
        """
        Returns an array with information (positions and labels) about every image.
        """
        return [self.getNumberStructure(i) for i in range(self.collectionSize)]

    def getAllNumbersRestructured(self):
        numbersData = self.getAllNumbersStructure()
        print("\nObject structure before transforming: ", numbersData[0])
        remove_anomaly_samples(numbersData)
        result = []
        for numData in numbersData:
            metadatas = []
            for i in range(len(numData['height'])):
                metadata = {}
                metadata['height'] = numData['height'][i]
                metadata['label'] = numData['label'][i]
                metadata['left'] = numData['left'][i]
                metadata['top'] = numData['top'][i]
                metadata['width'] = numData['width'][i]
                metadatas.append(metadata)
            result.append({'boxes': metadatas, 'name': numData["name"]})
        print("\nObject structure after transforming: ", result[0])
        return result
file_ = os.path.join(train_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_)
train_data = dsf.getAllNumbersRestructured()
file_ = os.path.join(test_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_)
test_data = dsf.getAllNumbersRestructured()
file_ = os.path.join(extra_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_, 0, 50000)
extra_data = dsf.getAllNumbersRestructured()
from PIL import Image

def print_data_stats(data, folder):
    data_imgSize = np.ndarray([len(data), 2])
    for i in np.arange(len(data)):
        filename = data[i]['name']
        filepath = os.path.join(folder, filename)
        data_imgSize[i, :] = Image.open(filepath).size[:]
    max_w, max_h = np.amax(data_imgSize[:, 0]), np.amax(data_imgSize[:, 1])
    min_w, min_h = np.amin(data_imgSize[:, 0]), np.amin(data_imgSize[:, 1])
    mean_w, mean_h = np.mean(data_imgSize[:, 0]), np.mean(data_imgSize[:, 1])
    print(folder, "max width and height:", max_w, max_h)
    print(folder, "min width and height:", min_w, min_h)
    print(folder, "mean width and height:", mean_w, mean_h, "\n")
    max_w_i, max_h_i = np.where(data_imgSize[:, 0] == max_w), np.where(data_imgSize[:, 1] == max_h)
    print(folder, "max width indices:", max_w_i)
    print(folder, "max height indices:", max_h_i, "\n")
    min_w_i, min_h_i = np.where(data_imgSize[:, 0] == min_w), np.where(data_imgSize[:, 1] == min_h)
    print(folder, "min width indices:", min_w_i)
    print(folder, "min height indices:", min_h_i, "\n***\n")
print_data_stats(train_data, train_folder)
print_data_stats(test_data, test_folder)
print_data_stats(extra_data, extra_folder)
img_size = 32

def prepare_images(samples, folder):
    print("Started preparing images for convnet...")
    prepared_images = np.ndarray([len(samples), img_size, img_size, 1], dtype='float32')
    actual_numbers = np.ones([len(samples), 6], dtype=int) * 10
    files = []
    for i in range(len(samples)):
        filename = samples[i]['name']
        filepath = os.path.join(folder, filename)
        image = Image.open(filepath)
        boxes = samples[i]['boxes']
        number_length = len(boxes)
        files.append(filename)
        # Index 0 stores the length of the label: 3 -> 1; 123 -> 3; 12543 -> 5.
        actual_numbers[i, 0] = number_length
        top = np.ndarray([number_length], dtype='float32')
        left = np.ndarray([number_length], dtype='float32')
        height = np.ndarray([number_length], dtype='float32')
        width = np.ndarray([number_length], dtype='float32')
        for j in range(number_length):
            actual_numbers[i, j + 1] = boxes[j]['label']
            if boxes[j]['label'] == 10:  # SVHN stores the digit 0 as label 10
                actual_numbers[i, j + 1] = 0
            top[j] = boxes[j]['top']
            left[j] = boxes[j]['left']
            height[j] = boxes[j]['height']
            width[j] = boxes[j]['width']
        img_min_top = np.amin(top)
        img_min_left = np.amin(left)
        img_height = np.amax(top) + height[np.argmax(top)] - img_min_top
        img_width = np.amax(left) + width[np.argmax(left)] - img_min_left
        img_left = np.floor(img_min_left - 0.1 * img_width)
        img_top = np.floor(img_min_top - 0.1 * img_height)
        img_right = np.amin([np.ceil(img_left + 1.2 * img_width), image.size[0]])
        img_bottom = np.amin([np.ceil(img_top + 1.2 * img_height), image.size[1]])
        # Crop around the digits with a 10% margin and resize to 32x32
        image = image.crop((int(img_left), int(img_top), int(img_right), int(img_bottom))).resize([img_size, img_size], Image.ANTIALIAS)
        # Convert the image to grayscale
        image = np.dot(np.array(image, dtype='float32'), [[0.2989], [0.5870], [0.1140]])
        mean = np.mean(image, dtype='float32')
        std = np.std(image, dtype='float32', ddof=1)
        if std < 0.0001:
            std = 1.0
        image = (image - mean) / std
        prepared_images[i, :, :] = image[:, :, :]
    print("Completed. Images cropped, resized and grayscaled")
    return prepared_images, actual_numbers, files
train_data, train_labels, _ = prepare_images(train_data, train_folder)
print(train_data.shape, train_labels.shape)
test_data, test_labels, test_filenames = prepare_images(test_data, test_folder)
print(test_data.shape, test_labels.shape)
extra_data, extra_labels, _ = prepare_images(extra_data, extra_folder)
print(extra_data.shape, extra_labels.shape)
print(extra_labels[:10])
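The per-image preprocessing above (grayscale projection, then mean/std standardization) can be sketched in isolation. This is a minimal sketch, assuming `rgb` stands in for one decoded 32x32 RGB image; the weights and the small-std guard mirror the code in `prepare_images`:

```python
import numpy as np

def standardize(rgb):
    # Project RGB onto luminance, matching the [0.2989, 0.5870, 0.1140] weights above
    gray = np.dot(rgb.astype('float32'), [[0.2989], [0.5870], [0.1140]])
    mean = np.mean(gray, dtype='float32')
    std = np.std(gray, dtype='float32', ddof=1)
    if std < 0.0001:  # guard against division by (nearly) zero on flat images
        std = 1.0
    return (gray - mean) / std

rgb = np.random.RandomState(0).randint(0, 256, size=(32, 32, 3))
out = standardize(rgb)
print(out.shape)  # (32, 32, 1)
```

After this step each image has roughly zero mean and unit variance, which keeps the inputs to the convnet on a comparable scale.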
from sklearn.utils import shuffle
# Here we add new data to our training set from extra set.
# Then we remove this part from memory to free it
train_data_temp = np.concatenate((train_data, extra_data[:40000, :, :, :]))
extra_data_temp = np.delete(extra_data, np.arange(40000), axis=0)
train_labels_temp = np.concatenate((train_labels, extra_labels[:40000]))
extra_labels_temp = np.delete(extra_labels, np.arange(40000), axis=0)
# And then we shuffle all the data we have
train_data_temp, train_labels_temp = shuffle(train_data_temp, train_labels_temp)
extra_data_temp, extra_labels_temp = shuffle(extra_data_temp, extra_labels_temp)
test_data_temp, test_labels_temp, test_filenames_temp = shuffle(test_data, test_labels, test_filenames)
print("\nTrain shapes:", train_data_temp.shape, train_labels_temp.shape)
print("Extra shapes:", extra_data_temp.shape, extra_labels_temp.shape)
print("Test shapes:", test_data_temp.shape, test_labels_temp.shape)
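sklearn's `shuffle` keeps every array passed to it aligned by applying one shared permutation. A minimal numpy sketch of the same idea, on toy arrays rather than the real datasets:

```python
import numpy as np

data = np.arange(12).reshape(6, 2)   # stand-in for image data
labels = np.arange(6)                # stand-in for labels
perm = np.random.RandomState(42).permutation(len(data))
data_s, labels_s = data[perm], labels[perm]

# Rows and labels still line up after shuffling
for row, lab in zip(data_s, labels_s):
    assert row[0] == 2 * lab
```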
pickle_file = 'SVHN.pickle'
try:
f = open(pickle_file, 'wb')
save = {
'train_data': train_data_temp,
'train_labels': train_labels_temp,
'test_data': test_data_temp,
'test_labels': test_labels_temp,
'test_filenames': test_filenames_temp,
'valid_data': extra_data_temp,
'valid_labels': extra_labels_temp
}
pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
f.close()
except Exception as e:
print('Unable to save data to', pickle_file, ':', e)
raise
statinfo = os.stat(pickle_file)
print('Compressed pickle size:', statinfo.st_size)
from collections import Counter
train_num_length = Counter(train_labels_temp[:,0])
test_num_length = Counter(test_labels_temp[:,0])
extra_num_length = Counter(extra_labels_temp[:,0])
import matplotlib.pyplot as plt
plt.figure(1)
plt.subplot(221)
plt.bar(train_num_length.keys(), train_num_length.values(), align='center')
plt.xticks(train_num_length.keys())
plt.title('Train')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.subplot(222)
plt.bar(test_num_length.keys(), test_num_length.values(), align='center')
plt.xticks(test_num_length.keys())
plt.title('Test')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.subplot(223)
plt.bar(extra_num_length.keys(), extra_num_length.values(), align='center')
plt.xticks(extra_num_length.keys())
plt.title('Extra')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.show()
pickle_file = 'SVHN.pickle'
with open(pickle_file, 'rb') as f:
save = pickle.load(f)
train_labels = save['train_labels']
test_labels = save['test_labels']
valid_labels = save['valid_labels']
del save
from collections import Counter
# Remove classes of empty labels
train_digits = Counter(train_labels.flatten()[np.where(train_labels.flatten() != 10)])
test_digits = Counter(test_labels.flatten()[np.where(test_labels.flatten() != 10)])
valid_digits = Counter(valid_labels.flatten()[np.where(valid_labels.flatten() != 10)])
import matplotlib.pyplot as plt
%matplotlib inline
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 5))
ax1.bar(train_digits.keys(), train_digits.values(), align='center')
ax1.set_xticks(train_digits.keys())
ax1.set_title('Train')
ax1.set_xlabel('Labels')
ax1.set_ylabel('Occurrences')
ax2.bar(test_digits.keys(), test_digits.values(), align='center')
ax2.set_xticks(test_digits.keys())
ax2.set_title('Test')
ax2.set_xlabel('Labels')
ax2.set_ylabel('Occurrences')
ax3.bar(valid_digits.keys(), valid_digits.values(), align='center')
ax3.set_xticks(valid_digits.keys())
ax3.set_title('Validation')
ax3.set_xlabel('Labels')
ax3.set_ylabel('Occurrences')
plt.show()
from __future__ import print_function
import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from six.moves import cPickle as pickle
from six.moves import range
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM
from keras.utils import np_utils
pickle_file = 'SVHN.pickle'
with open(pickle_file, 'rb') as f:
save = pickle.load(f)
train_dataset = save['train_data']
train_labels = save['train_labels']
test_dataset = save['test_data']
test_labels = save['test_labels']
test_filenames = save['test_filenames']
valid_dataset = save['valid_data']
valid_labels = save['valid_labels']
del save
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
dataset_size = 73401
def createSequences():
    """Split each 6-value label row into six per-position label arrays.
    Index 0 holds the sequence length; indices 1-5 hold the digits (10 = blank)."""
    columns = [[], [], [], [], [], []]
    for i in range(dataset_size):
        for j in range(6):
            columns[j].append(train_labels[i][j])
    # The original np.array(...) calls discarded their results; return real arrays
    return tuple(np.array(c) for c in columns)
y1,y2,y3,y4,y5,y6 = createSequences()
from keras.utils import np_utils
print(len(y1))
print(y1[1])
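Since `train_labels` is already a 2-D array, the same per-position split could also be done with numpy column slicing. A sketch on a toy label array (index 0 = length, 10 = blank):

```python
import numpy as np

labels = np.array([[1, 3, 10, 10, 10, 10],   # the number "3"
                   [3, 1, 2, 3, 10, 10]])    # the number "123"
columns = [labels[:, k] for k in range(6)]   # one array per output position
print(columns[0].tolist())  # [1, 3]  -> sequence lengths
print(columns[1].tolist())  # [3, 1]  -> first digits
```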
dataset_size = 13068
def createSequences():
    """Split each 6-value test label row into six per-position label arrays."""
    columns = [[], [], [], [], [], []]
    for i in range(dataset_size):
        for j in range(6):
            columns[j].append(test_labels[i][j])
    return tuple(np.array(c) for c in columns)
y_1,y_2,y_3,y_4,y_5,y_6 = createSequences()
from keras.utils import np_utils
print(len(y_1))
print(y_1[1])
Y1 = np_utils.to_categorical(y1, 11)
Y2 = np_utils.to_categorical(y2, 11)
Y3 = np_utils.to_categorical(y3, 11)
Y4 = np_utils.to_categorical(y4, 11)
Y5 = np_utils.to_categorical(y5, 11)
Y6 = np_utils.to_categorical(y6, 11)
Y_1 = np_utils.to_categorical(y_1, 11)
Y_2 = np_utils.to_categorical(y_2, 11)
Y_3 = np_utils.to_categorical(y_3, 11)
Y_4 = np_utils.to_categorical(y_4, 11)
Y_5 = np_utils.to_categorical(y_5, 11)
Y_6 = np_utils.to_categorical(y_6, 11)
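`np_utils.to_categorical(y, 11)` one-hot encodes each label into an 11-way vector (digits 0-9 plus the blank class 10). A pure-numpy sketch of the same encoding, with `to_one_hot` as a hypothetical stand-in name:

```python
import numpy as np

def to_one_hot(y, nb_classes=11):
    out = np.zeros((len(y), nb_classes), dtype='float32')
    out[np.arange(len(y)), y] = 1.0  # set the column matching each label
    return out

Y = to_one_hot([0, 7, 10])
print(Y.shape)        # (3, 11)
print(Y[1].argmax())  # 7
```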
X_test1=test_dataset
X_train1=train_dataset
#X_train1 /= 255
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
inputs = Input(shape=(32,32,1))
conv = Convolution2D(32, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu')
conv1 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')
conv2= Convolution2D(128, 5, 5, border_mode='same', activation='relu')
x=conv(inputs)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv1(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv2(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
#model.add(Convolution2D(64, 5,5, border_mode='same', activation='relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Convolution2D(128, 5, 5, border_mode='same', activation='relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
x=Flatten()(x)
x = Dense(1064, activation='relu')(x)
x = Dense(800, activation='relu')(x)
x = Dense(600, activation='relu')(x)
x = Dense(400, activation='relu')(x)
x = Dense(200, activation='relu')(x)
x = Dense(164, activation='relu')(x)
x = Dense(32, activation='relu')(x)
predictions1 = Dense(11, activation='softmax')(x)
predictions2 = Dense(11, activation='softmax')(x)
predictions3 = Dense(11, activation='softmax')(x)
predictions4 = Dense(11, activation='softmax')(x)
predictions5 = Dense(11, activation='softmax')(x)
predictions6= Dense(11, activation='softmax')(x)
model = Model(input=inputs, output=[predictions1,predictions2,predictions3,predictions4,predictions5,predictions6])
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers import Dropout
inputs = Input(shape=(32,32,1))
conv = Convolution2D(32, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu')
conv1 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')
conv2= Convolution2D(128, 5, 5, border_mode='same', activation='relu')
conv3= Convolution2D(256, 5, 5, border_mode='same', activation='relu')
x=conv(inputs)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv1(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv2(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv3(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
#model.add(Convolution2D(64, 5,5, border_mode='same', activation='relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Convolution2D(128, 5, 5, border_mode='same', activation='relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
x=Flatten()(x)
x=Dropout(0.2)(x)
x = Dense(1064, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(800, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(600, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(400, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(200, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(164, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(32, activation='relu')(x)
x=Dropout(0.2)(x)
predictions1 = Dense(11, activation='softmax')(x)
predictions2 = Dense(11, activation='softmax')(x)
predictions3 = Dense(11, activation='softmax')(x)
predictions4 = Dense(11, activation='softmax')(x)
predictions5 = Dense(11, activation='softmax')(x)
predictions6= Dense(11, activation='softmax')(x)
model = Model(input=inputs, output=[predictions1,predictions2,predictions3,predictions4,predictions5,predictions6])
model.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
print('started............')
scores = model.evaluate(X_test1, [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6], verbose=0)
print('Test loss and Test Accuracy', scores)
print('started............')
scores = model.evaluate(X_test1, [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6], verbose=0)
print('Test loss and Test Accuracy', scores)
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
model.fit(X_train1, [Y1,Y2,Y3,Y4,Y5,Y6])
print('started............')
scores = model.evaluate(X_test1, [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6], verbose=0)
print('Test loss and Test Accuracy', scores)
Describe how you set up the training and testing data for your model. How does the model perform on a realistic dataset?
The training and testing images are 32x32 and contain one to five digits. The training and testing labels are six-value arrays, where the first value gives the number of digits present in the image and the remaining five values give the digits themselves; a blank position is encoded as 10.
The model was trained on X_train1 for 20 epochs with a batch size of 32.
The model performed well on realistic data, with roughly 89% to 100% accuracy per classifier.
Accuracy on the realistic test set (X_test1) for the six classifiers was 0.94804101624107884, 0.82774716863801512, 0.79025099477820493, 0.93036424856431121, 0.99051117232935415 and 0.99984695439240889 respectively.
What changes did you have to make, if any, to achieve "good" results? Were there any options you explored that made the results worse?
I added a third convolution layer, 'Conv3', with 256 filters of size 5x5, along with a 2x2 max-pooling layer, to achieve good results.
What were your initial and final results with testing on a realistic dataset? Do you believe your model is doing a good enough job at classifying numbers correctly?
The final results are better compared to the initial results. The initial accuracies for the six classifiers were dense_8_acc: 0.9404 - dense_9_acc: 0.7919 - dense_10_acc: 0.7266 - dense_11_acc: 0.7914 - dense_12_acc: 0.9488 - dense_13_acc: 0.9995.
The final accuracies for the six classifiers were dense_8_acc: 0.9501 - dense_9_acc: 0.8584 - dense_10_acc: 0.8155 - dense_11_acc: 0.8654 - dense_12_acc: 0.9581 - dense_13_acc: 0.9995.
A clear improvement in the accuracies of the first five classifiers can be seen.
The model is doing a good enough job at classifying numbers correctly.
Take several pictures of numbers that you find around you (at least five), and run them through your classifier on your computer to produce example results. Alternatively (optionally), you can try using OpenCV / SimpleCV / Pygame to capture live images from a webcam and run those through your classifier.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
from __future__ import print_function
from PIL import Image
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM
from keras.utils import np_utils
import numpy as np
import pandas as pd
from time import time
from IPython.display import display
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import tarfile
%matplotlib inline
import random
import math
from IPython.display import Image
from scipy import ndimage
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
from os import listdir
from os.path import isfile, join
import numpy
import cv2
mypath='resized/'
onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
print(onlyfiles)
images = numpy.empty(len(onlyfiles), dtype=object)
for n in range(0, len(onlyfiles)):
images[n] = cv2.imread( join(mypath,onlyfiles[n]) )
import cv2
for i in range (0,5):
images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2GRAY)
Choose five candidate images of numbers you took from around you and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult?
Answer: There are certain qualities of an image that might make classification difficult, such as uneven lighting, blur, background clutter, skewed digits and unusual fonts.
for i in range(0, 5):
    # np.hstack on a single image is a no-op, so copy the image directly
    dataset = np.ndarray(shape=(1, 32, 32), dtype=np.float32)
    dataset[0, :, :] = images[i]
    fig = plt.figure()
    plt.imshow(dataset[0])
    plt.show()
    hand_dataset = dataset[0].reshape(1, 32, 32, 1).astype('float32')
    classes = model.predict(hand_dataset, batch_size=2, verbose=0)
    print(classes)
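`model.predict` returns a list of six softmax arrays (the length head first, then five digit heads). Decoding a prediction into a digit string just takes the argmax of each head and drops the blanks. A sketch with hand-made probabilities standing in for `classes`; `decode` and `fake_head` are hypothetical helper names:

```python
import numpy as np

def decode(heads):
    # heads: list of six (1, 11) softmax outputs for one image
    length = int(np.argmax(heads[0]))
    digits = [int(np.argmax(h)) for h in heads[1:]]
    return ''.join(str(d) for d in digits if d != 10)[:length]

def fake_head(cls):
    # put all probability mass on one class
    h = np.zeros((1, 11))
    h[0, cls] = 1.0
    return h

heads = [fake_head(2), fake_head(4), fake_head(2), fake_head(10),
         fake_head(10), fake_head(10)]
print(decode(heads))  # '42'
```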
Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the realistic dataset?
The model performs equally well on captured pictures when compared to the realistic data. On SVHN data the accuracy was 0.94804101624107884, 0.82774716863801512, 0.79025099477820493, 0.93036424856431121, 0.99051117232935415 and 0.99984695439240889. I tested the trained model on five images captured with a camera; the model could identify the handwritten numbers and performed well.
If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.
Answer: Leave blank if you did not complete this part.
There are many things you can do once you have the basic classifier in place. One example would be to also localize where the numbers are on the image. The SVHN dataset provides bounding boxes that you can tune to train a localizer. Train a regression loss to the coordinates of the bounding box, and then test it.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import os
import sys
import tarfile
import tensorflow as tf
from IPython.display import Image
from scipy import ndimage
from six.moves.urllib.request import urlretrieve
from six.moves import cPickle as pickle
%matplotlib inline
url = 'http://ufldl.stanford.edu/housenumbers/'
last_percent_reported = None
def download_progress_hook(count, blockSize, totalSize):
"""
A hook to report the progress of a download. This is mostly intended for users with
slow internet connections. Reports every 1% change in download progress.
"""
global last_percent_reported
percent = int(count * blockSize * 100 / totalSize)
if last_percent_reported != percent:
if percent % 5 == 0:
sys.stdout.write("%s%%" % percent)
sys.stdout.flush()
else:
sys.stdout.write(".")
sys.stdout.flush()
last_percent_reported = percent
def maybe_download(filename, force=False):
"""
Download a file if not present, and make sure it's the right size.
"""
if force or not os.path.exists(filename):
print('Attempting to download:', filename)
filename, _ = urlretrieve(url + filename, filename, reporthook=download_progress_hook)
print('\nDownload Complete!')
else:
print(filename, 'is already downloaded. Skipped.')
return filename
train_filename = maybe_download('train.tar.gz')
test_filename = maybe_download('test.tar.gz')
extra_filename = maybe_download('extra.tar.gz')
np.random.seed(133)
def maybe_extract(file_, force=False):
filename = os.path.splitext(os.path.splitext(file_)[0])[0] # remove .tar.gz
if os.path.isdir(filename) and not force:
# You may override by setting force=True.
print('%s is already present - skipping extraction of %s.' % (filename, file_))
else:
print('Extracting %s file data. Please wait...' % file_)
tar = tarfile.open(file_)
sys.stdout.flush()
tar.extractall()
tar.close()
print('File %s is successfully extracted into %s directory.' % (file_, filename))
return filename
train_folder = maybe_extract(train_filename)
test_folder = maybe_extract(test_filename)
extra_folder = maybe_extract(extra_filename)
def remove_anomaly_samples(data, max_class_length = 5):
    """
    Remove all samples whose label is longer than max_class_length digits.
    """
    print("\nDataset size before update:", len(data))
    # Iterate in reverse so deletions do not shift the indices still to be visited
    for i in range(len(data) - 1, -1, -1):
        if len(data[i]['label']) > max_class_length:
            print("\nAnomaly at index %d detected. Class size: %d" % (i, len(data[i]['label'])))
            del data[i]
    print("\nDataset size after update:", len(data))
    return data
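The filter above can be sanity-checked on toy data. A self-contained sketch that restates the same rule inline (the sample dicts here are made up for illustration):

```python
# Toy samples: 'label' holds one entry per digit, as in the digitStruct data
data = [{'label': [1, 9]}, {'label': [1, 2, 3, 4, 5, 6]}, {'label': [7]}]

# Keep only samples with at most five digits (same rule as remove_anomaly_samples)
kept = [d for d in data if len(d['label']) <= 5]
print(len(kept))  # 2
```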
import h5py
class DigitStructsWrapper:
def __init__(self, file_, start_ = 0, end_ = 0):
self.file_ = h5py.File(file_, 'r')
self.names = self.file_['digitStruct']['name'][start_:end_] if end_ > 0 else self.file_['digitStruct']['name']
self.bboxes = self.file_['digitStruct']['bbox'][start_:end_] if end_ > 0 else self.file_['digitStruct']['bbox']
self.collectionSize = len(self.names)
print("\n%s file structure contains %d entries" % (file_, self.collectionSize))
def bboxHelper(self, keys_):
"""
Method handles the coding difference when there is exactly one bbox or an array of bbox.
"""
if (len(keys_) > 1):
val = [self.file_[keys_.value[j].item()].value[0][0] for j in range(len(keys_))]
else:
val = [keys_.value[0][0]]
return val
# getBbox returns a dict of data for the n(th) bbox.
def getBbox(self, n):
bbox = {}
bb = self.bboxes[n].item()
bbox['height'] = self.bboxHelper(self.file_[bb]["height"])
bbox['left'] = self.bboxHelper(self.file_[bb]["left"])
bbox['top'] = self.bboxHelper(self.file_[bb]["top"])
bbox['width'] = self.bboxHelper(self.file_[bb]["width"])
bbox['label'] = self.bboxHelper(self.file_[bb]["label"])
return bbox
def getName(self, n):
"""
Method returns the filename for the n(th) digitStruct. Since each character is stored
as an array of ASCII codes, we convert it back by calling chr on each element.
"""
return ''.join([chr(c[0]) for c in self.file_[self.names[n][0]].value])
def getNumberStructure(self,n):
s = self.getBbox(n)
s['name']=self.getName(n)
return s
def getAllNumbersStructure(self):
"""
Method returns an array, which contains information about every image.
This info contains: positions, labels
"""
return [self.getNumberStructure(i) for i in range(self.collectionSize)]
def getAllNumbersRestructured(self):
train_box=list()
numbersData = self.getAllNumbersStructure()
print("\nObject structure before transforming: ", numbersData[0])
remove_anomaly_samples(numbersData)
result = []
for numData in numbersData:
metadatas = []
for i in range(len(numData['height'])):
metadata = {}
metadata['height'] = numData['height'][i]
metadata['label'] = numData['label'][i]
metadata['left'] = numData['left'][i]
metadata['top'] = numData['top'][i]
metadata['width'] = numData['width'][i]
metadatas.append(metadata)
result.append({ 'boxes':metadatas, 'name':numData["name"] })
train_box.append(metadatas)
print("\nObject structure after transforming: ", result[0])
return result,train_box
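`getName` rebuilds a filename from the array of character codes that the HDF5 digitStruct stores; the conversion itself is just `chr` over each code. A self-contained sketch with a made-up code array:

```python
# Each filename is stored as a column vector of character codes, e.g. "1.png"
codes = [[49], [46], [112], [110], [103]]
name = ''.join(chr(c[0]) for c in codes)
print(name)  # 1.png
```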
file_ = os.path.join(train_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_)
train_data,train_box = dsf.getAllNumbersRestructured()
print(len(train_box))
print((train_box[0]))
print(train_data[0])
file_ = os.path.join(test_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_)
test_data,test_box = dsf.getAllNumbersRestructured()
print(len(test_box))
file_ = os.path.join(extra_folder, 'digitStruct.mat')
dsf = DigitStructsWrapper(file_, 0, 50000)
extra_data,extra_box = dsf.getAllNumbersRestructured()
from PIL import Image
def print_data_stats(data, folder):
data_imgSize = np.ndarray([len(data),2])
for i in np.arange(len(data)):
filename = data[i]['name']
filepath = os.path.join(folder, filename)
data_imgSize[i, :] = Image.open(filepath).size[:]
max_w, max_h = np.amax(data_imgSize[:,0]), np.amax(data_imgSize[:,1])
min_w, min_h = np.amin(data_imgSize[:,0]), np.amin(data_imgSize[:,1])
mean_w, mean_h = np.mean(data_imgSize[:,0]), np.mean(data_imgSize[:,1])
print(folder, "max width and height:", max_w, max_h)
print(folder, "min width and height:", min_w, min_h)
print(folder, "mean width and height:", mean_w, mean_h, "\n")
max_w_i, max_h_i = np.where(data_imgSize[:,0] == max_w), np.where(data_imgSize[:,1] == max_h)
print(folder, "max width indicies:", max_w_i)
print(folder, "max height indicies:", max_h_i, "\n")
min_w_i, min_h_i = np.where(data_imgSize[:,0] == min_w), np.where(data_imgSize[:,1] == min_h)
print(folder, "min width indicies:", min_w_i)
print(folder, "min height indicies:", min_h_i, "\n***\n")
print_data_stats(train_data, train_folder)
print_data_stats(test_data, test_folder)
print_data_stats(extra_data, extra_folder)
img_size = 32
def prepare_images(samples, folder):
print("Started preparing images for convnet...")
prepared_images = np.ndarray([len(samples),img_size,img_size,1], dtype='float32')
actual_numbers = np.ones([len(samples),6], dtype=int) * 10
files = []
for i in range(len(samples)):
filename = samples[i]['name']
filepath = os.path.join(folder, filename)
image = Image.open(filepath)
boxes = samples[i]['boxes']
number_length = len(boxes)
files.append(filename)
# at 0 index we store length of a label. 3 -> 1; 123-> 3, 12543 -> 5
actual_numbers[i,0] = number_length
top = np.ndarray([number_length], dtype='float32')
left = np.ndarray([number_length], dtype='float32')
height = np.ndarray([number_length], dtype='float32')
width = np.ndarray([number_length], dtype='float32')
for j in range(number_length):
# here we use j+1 since first entry used by label length
actual_numbers[i,j+1] = boxes[j]['label']
if boxes[j]['label'] == 10: # Replacing 10 with 0
actual_numbers[i,j+1] = 0
top[j] = boxes[j]['top']
left[j] = boxes[j]['left']
height[j] = boxes[j]['height']
width[j] = boxes[j]['width']
img_min_top = np.amin(top)
img_min_left = np.amin(left)
img_height = np.amax(top) + height[np.argmax(top)] - img_min_top
img_width = np.amax(left) + width[np.argmax(left)] - img_min_left
img_left = np.floor(img_min_left - 0.1 * img_width)
img_top = np.floor(img_min_top - 0.1 * img_height)
img_right = np.amin([np.ceil(img_left + 1.2 * img_width), image.size[0]])
img_bottom = np.amin([np.ceil(img_top + 1.2 * img_height), image.size[1]])
image = image.crop((int(img_left), int(img_top), int(img_right), int(img_bottom))).resize([img_size, img_size], Image.ANTIALIAS) # Resize image to 32x32
image = np.dot(np.array(image, dtype='float32'), [[0.2989],[0.5870],[0.1140]]) # Convert image to the grayscale
mean = np.mean(image, dtype='float32')
std = np.std(image, dtype='float32', ddof=1)
if std < 0.0001:
std = 1.0
image = (image - mean) / std
prepared_images[i,:,:] = image[:,:,:]
print("Completed. Images cropped, resized and grayscaled")
return prepared_images, actual_numbers, files
train_data, train_labels, _ = prepare_images(train_data, train_folder)
print(train_data.shape, train_labels.shape)
test_data, test_labels, test_filenames = prepare_images(test_data, test_folder)
print(test_data.shape, test_labels.shape)
extra_data, extra_labels, _ = prepare_images(extra_data, extra_folder)
print(extra_data.shape, extra_labels.shape)
print(extra_labels[:10])
train_box=np.array(train_box)
test_box=np.array(test_box)
extra_box=np.array(extra_box)
print(extra_box.shape)
print(extra_labels.shape)
train_bounding_box=np.concatenate((train_box, extra_box[:40000, ]))
from sklearn.utils import shuffle
# Here we add new data to our training set from extra set.
# Then we remove this part from memory to free it
train_data_temp = np.concatenate((train_data, extra_data[:40000, :, :, :]))
extra_data_temp = np.delete(extra_data, np.arange(40000), axis=0)
train_bounding_box=np.concatenate((train_box, extra_box[:40000, ]))
train_labels_temp = np.concatenate((train_labels, extra_labels[:40000]))
extra_labels_temp = np.delete(extra_labels, np.arange(40000), axis=0)
# And then we shuffle all the data we have
train_data_temp, train_labels_temp,train_bounding_box = shuffle(train_data_temp, train_labels_temp,train_bounding_box)
extra_data_temp, extra_labels_temp = shuffle(extra_data_temp, extra_labels_temp)
test_data_temp, test_labels_temp, test_filenames_temp,test_box = shuffle(test_data, test_labels, test_filenames,test_box)
print("\nTrain shapes:", train_data_temp.shape, train_labels_temp.shape)
print("Extra shapes:", extra_data_temp.shape, extra_labels_temp.shape)
print("Test shapes:", test_data_temp.shape, test_labels_temp.shape)
print (train_bounding_box.shape)
print (test_box.shape)
pickle_file = 'SVHN.pickle'
try:
f = open(pickle_file, 'wb')
save = {
'train_data': train_data_temp,
'train_labels': train_labels_temp,
'test_data': test_data_temp,
'test_labels': test_labels_temp,
'test_filenames': test_filenames_temp,
'valid_data': extra_data_temp, # The rest of extra data will be used
'valid_labels': extra_labels_temp # as validation set during model training
}
pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
f.close()
except Exception as e:
print('Unable to save data to', pickle_file, ':', e)
raise
statinfo = os.stat(pickle_file)
print('Compressed pickle size:', statinfo.st_size)
from collections import Counter
train_num_length = Counter(train_labels_temp[:,0])
test_num_length = Counter(test_labels_temp[:,0])
extra_num_length = Counter(extra_labels_temp[:,0])
import matplotlib.pyplot as plt
plt.figure(1)
plt.subplot(221)
plt.bar(train_num_length.keys(), train_num_length.values(), align='center')
plt.xticks(train_num_length.keys())
plt.title('Train')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.subplot(222)
plt.bar(test_num_length.keys(), test_num_length.values(), align='center')
plt.xticks(test_num_length.keys())
plt.title('Test')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.subplot(223)
plt.bar(extra_num_length.keys(), extra_num_length.values(), align='center')
plt.xticks(extra_num_length.keys())
plt.title('Extra')
plt.xlabel('Labels')
plt.ylabel('Occurrences')
plt.show()
pickle_file = 'SVHN.pickle'
with open(pickle_file, 'rb') as f:
save = pickle.load(f)
train_labels = save['train_labels']
test_labels = save['test_labels']
valid_labels = save['valid_labels']
del save
pickle_file = 'box.pickle'
try:
f = open(pickle_file, 'wb')
save = {
'train_box': train_bounding_box,
'test_box': test_box,
}
pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
f.close()
except Exception as e:
print('Unable to save data to', pickle_file, ':', e)
raise
statinfo = os.stat(pickle_file)
print('Compressed pickle size:', statinfo.st_size)
from __future__ import print_function
import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from six.moves import cPickle as pickle
from six.moves import range
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM
from keras.utils import np_utils
pickle_file = 'SVHN.pickle'
with open(pickle_file, 'rb') as f:
save = pickle.load(f)
train_dataset = save['train_data']
train_labels = save['train_labels']
test_dataset = save['test_data']
test_labels = save['test_labels']
test_filenames = save['test_filenames']
valid_dataset = save['valid_data']
valid_labels = save['valid_labels']
del save
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
pickle_file = 'box.pickle'
with open(pickle_file, 'rb') as f:
save = pickle.load(f)
train_box = save['train_box']
test_box = save['test_box']
del save
print('Training box', train_box.shape)
print('Test box', test_box.shape)
print(train_box[2])
print(test_box[1])
count = 0
for key, value in train_box[1][0].items():  # .iteritems() exists only in Python 2
    print(value)
# Flatten each sample's list of box dicts into a fixed-length row:
# 6 boxes x 5 fields (height, width, top, left, label) = 30 values.
# Unused slots stay zero (np.empty left them uninitialized before, and the
# array was one column short for the sixth box).
x0 = np.zeros([73401, 30], dtype=int)
for i in range(73401):
    for real, j in enumerate(train_box[i]):
        base = real * 5
        x0[i][base + 0] = j['height']
        x0[i][base + 1] = j['width']
        x0[i][base + 2] = j['top']
        x0[i][base + 3] = j['left']
        x0[i][base + 4] = j['label']
test_bound_box=list()
x0=np.empty([13068,29], dtype=int)
x1=list()
x2=list()
x3=list()
x4=list()
x5=list()
for i in range (0,13068):
count=0
for j in test_box[i]:
count=count+1
#if i<=5:
#print("count:",count)
real=0
for j in test_box[i]:
if real==0:
x0[i][0]=j['height']
x0[i][1]=j['width']
x0[i][2]=j['top']
x0[i][3]=j['left']
x0[i][4]=j['label']
#x0[i]=[j['height'],j['width'],j['top'],j['left'], j['label'],0 , 0,0 ,0 , 0, 0, 0,0 , 0, 0, 0, 0,0, 0, 0 ]
if real==1:
x0[i][5]=j['height']
x0[i][6]=j['width']
x0[i][7]=j['top']
x0[i][8]=j['left']
x0[i][9]=j['label']
#x0[i]=[0,0,0,0,0,j['height'],j['width'],j['top'],j['left'], j['label'],0 , 0,0 ,0 , 0, 0, 0,0 , 0, 0 ]
if real==2:
x0[i][10]=j['height']
x0[i][11]=j['width']
x0[i][12]=j['top']
x0[i][13]=j['left']
x0[i][14]=j['label']
if real==3:
x0[i][15]=j['height']
x0[i][16]=j['width']
x0[i][17]=j['top']
x0[i][18]=j['left']
x0[i][19]=j['label']
if real==4:
x0[i][20]=j['height']
x0[i][21]=j['width']
x0[i][22]=j['top']
x0[i][23]=j['left']
x0[i][24]=j['label']
if real==5:
x0[i][25]=j['height']
x0[i][26]=j['width']
x0[i][27]=j['top']
x0[i][28]=j['left']
x0[i][29]=j['label']
real=real+1
print('Done')
print (x0.shape)
print (x0[0])
pickle_file = 're_box.pickle'
try:
f = open(pickle_file, 'wb')
save = {
'train_bound_box': x0
}
pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
f.close()
except Exception as e:
print('Unable to save data to', pickle_file, ':', e)
raise
statinfo = os.stat(pickle_file)
print('Compressed pickle size:', statinfo.st_size)
pickle_file = 're_box.pickle'
with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_bound_box = save['train_bound_box']
    del save
print('Training bound box', train_bound_box.shape)
from sklearn.preprocessing import MinMaxScaler
# Initialize a scaler, then apply it to the features
scaler = MinMaxScaler()
train_bound_box = scaler.fit_transform(train_bound_box)
# Show an example of a record with scaling applied
#display(train_bound_box_normal(n = 1))
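MinMaxScaler rescales each column independently to [0, 1], which matches the sigmoid output range of the bounding-box regressors defined below; the fitted scaler can also map predictions back to pixel units with `inverse_transform`. A small sketch on made-up box rows:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up (height, width, top, left) rows.
rows = np.array([[10., 8., 2., 3.],
                 [30., 16., 6., 9.],
                 [20., 12., 4., 6.]])
scaler = MinMaxScaler()
scaled = scaler.fit_transform(rows)        # each column now spans [0, 1]
restored = scaler.inverse_transform(scaled)  # back to pixel units
print(scaled[2], restored[2])
```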
X=train_bound_box[:,0:24]
X_t=x0[:,0:24]
# Regression targets: height, width, top and left for each of the first
# five digits (every fifth column, the label, is skipped).
geom_cols = [c for c in range(25) if c % 5 != 4]
(y0, y1, y2, y3, y4, y5, y6, y7, y8, y9,
 y10, y11, y12, y13, y14, y15, y16, y17, y18, y19) = (
    train_bound_box[:, c] for c in geom_cols)
# x0 now holds the raw test boxes; scale them with the scaler fitted on the
# training boxes so the targets match the sigmoid output range at evaluation.
test_bound_box = scaler.transform(x0)
print(test_bound_box.shape)
geom_cols = [c for c in range(25) if c % 5 != 4]
(y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9,
 y_10, y_11, y_12, y_13, y_14, y_15, y_16, y_17, y_18, y_19) = (
    test_bound_box[:, c] for c in geom_cols)
# Make sure every target is a numpy array.
(y0, y1, y2, y3, y4, y5, y6, y7, y8, y9,
 y10, y11, y12, y13, y14, y15, y16, y17, y18, y19) = (
    np.array(y) for y in (y0, y1, y2, y3, y4, y5, y6, y7, y8, y9,
                          y10, y11, y12, y13, y14, y15, y16, y17, y18, y19))
(y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9,
 y_10, y_11, y_12, y_13, y_14, y_15, y_16, y_17, y_18, y_19) = (
    np.array(y) for y in (y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9,
                          y_10, y_11, y_12, y_13, y_14, y_15, y_16, y_17,
                          y_18, y_19))
inputs = Input(shape=(32,32,1))
conv = Convolution2D(32, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu')
conv1 = Convolution2D(64, 5, 5, border_mode='same', activation='relu')
conv2= Convolution2D(128, 5, 5, border_mode='same', activation='relu')
conv3= Convolution2D(256, 5, 5, border_mode='same', activation='relu')
conv4= Convolution2D(512, 5, 5, border_mode='same', activation='relu')
x=conv(inputs)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv1(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv2(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv3(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=conv4(x)
x=MaxPooling2D(pool_size=(2, 2))(x)
x=Flatten()(x)
x=Dropout(0.2)(x)
x = Dense(1064, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(800, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(600, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(400, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(200, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(164, activation='relu')(x)
x=Dropout(0.2)(x)
x = Dense(32, activation='relu')(x)
x=Dropout(0.2)(x)
predictions1 = Dense(1, init='normal')(x)
predictions2 = Dense(1,activation='sigmoid')(x)
predictions3 = Dense(1,activation='sigmoid')(x)
predictions4 = Dense(1,activation='sigmoid')(x)
predictions5 = Dense(1,activation='sigmoid')(x)
predictions6 = Dense(1,activation='sigmoid')(x)
predictions7 = Dense(1,activation='sigmoid')(x)
predictions8 = Dense(1,activation='sigmoid')(x)
predictions9 = Dense(1,activation='sigmoid')(x)
predictions10 = Dense(1,activation='sigmoid')(x)
predictions11 = Dense(1,activation='sigmoid')(x)
predictions12 = Dense(1,activation='sigmoid')(x)
predictions13 = Dense(1,activation='sigmoid')(x)
predictions14 = Dense(1,activation='sigmoid')(x)
predictions15 = Dense(1,activation='sigmoid')(x)
predictions16 = Dense(1,activation='sigmoid')(x)
predictions17 = Dense(1,activation='sigmoid')(x)
predictions18 = Dense(1,activation='sigmoid')(x)
predictions19 = Dense(1,activation='sigmoid')(x)
predictions20 = Dense(1,activation='sigmoid')(x)
model = Model(input=inputs, output=[predictions1,predictions2,predictions3,predictions4,predictions5,predictions6,predictions7,predictions8,predictions9,predictions10,predictions11,predictions12,predictions13,predictions14,predictions15,predictions16,predictions17,predictions18,predictions19,predictions20])
model.compile(optimizer='rmsprop',loss='mse')
X_train1=train_dataset
print(X_train1.shape)
print(y0[1])
model.fit(X_train1, [y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,y17,y18,y19])
model.fit(X_train1, [y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15,y16,y17,y18,y19],nb_epoch=5, batch_size=32)
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
from keras.models import model_from_json
# load json and create model
json_file = open('model.json', 'r')
re_model_json = json_file.read()
json_file.close()
re_model = model_from_json(re_model_json)
# load weights into new model
re_model.load_weights("model.h5")
print("Loaded model from disk")
re_model.compile(optimizer='rmsprop',loss='mse')
X_test1 = test_dataset  # the test images loaded from the pickle earlier
score=re_model.evaluate(X_test1,[y_0,y_1,y_2,y_3,y_4,y_5,y_6,y_7,y_8,y_9,y_10,y_11,y_12,y_13,y_14,y_15,y_16,y_17,y_18,y_19], batch_size=32)
print(score)
X_predict=re_model.predict(X_train1)
X_predict=np.array(X_predict)
X_predict_test=re_model.predict(X_test1)
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import Merge
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers import merge
from keras.layers.convolutional import Convolution1D
left_branch = Sequential()
#left_branch.add(Dense(1320, input_dim=(1024)))
left_branch.add(Convolution2D(32, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu'))
left_branch.add(MaxPooling2D(pool_size=(2, 2)))
left_branch.add(Convolution2D(64, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu'))
left_branch.add(MaxPooling2D(pool_size=(2, 2)))
left_branch.add(Convolution2D(128, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu'))
left_branch.add(MaxPooling2D(pool_size=(2, 2)))
left_branch.add(Convolution2D(256, 5, 5, border_mode='same', input_shape=( 32,32,1), activation='relu'))
left_branch.add(MaxPooling2D(pool_size=(2, 2)))
left_branch.add(Flatten())
left_branch.add(Dense(1400, activation='relu'))
left_branch.add(Dropout(0.2))
left_branch.add(Dense(800, activation='relu'))
left_branch.add(Dense(400, activation='relu'))
right_branch = Sequential()
right_branch.add(Dense(2800, input_dim=24))
right_branch.add(Dense(1600, activation='relu'))
right_branch.add(Dropout(0.2))
right_branch.add(Dense(1400, activation='relu'))
right_branch.add(Dropout(0.2))
right_branch.add(Dense(400, activation='relu'))
merged = Merge([left_branch, right_branch], mode='concat')
final_model1 = Sequential()
final_model1.add(merged)
final_model1.add(Dense(3200))
final_model1.add(Dropout(0.2))
final_model1.add(Dense(2800, activation='relu'))
final_model1.add(Dense(300, activation='relu'))
final_model1.add(Dropout(0.2))
final_model1.add(Dense(140, activation='relu'))
final_model1.add(Dense(32))
final_model1.add(Dense(11, activation='softmax'))
from keras.optimizers import SGD
final_model1.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=['accuracy'])
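final_model1 and its siblings are trained below against Y2 through Y6, which were built in an earlier cell; given the `Dense(11, activation='softmax')` heads, each is presumably an 11-way one-hot encoding (digits 0-9 plus a class for an absent position). A numpy sketch of that encoding, with made-up labels:

```python
import numpy as np

labels = np.array([3, 10, 7])                 # 10 = hypothetical 'absent' class
one_hot = np.zeros((len(labels), 11), dtype=np.float32)
one_hot[np.arange(len(labels)), labels] = 1.0  # one 1 per row, at the label
print(one_hot.shape)
```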
final_model2 = Sequential()
final_model2.add(merged)
final_model2.add(Dense(3200))
final_model2.add(Dropout(0.2))
final_model2.add(Dense(2800, activation='relu'))
final_model2.add(Dense(300, activation='relu'))
final_model2.add(Dropout(0.2))
final_model2.add(Dense(140, activation='relu'))
#final_model.add(Dropout(0.2))
final_model2.add(Dense(32))
final_model2.add(Dense(11, activation='softmax'))
final_model2.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=['accuracy'])
final_model3 = Sequential()
final_model3.add(merged)
final_model3.add(Dense(3200))
final_model3.add(Dropout(0.2))
final_model3.add(Dense(2800, activation='relu'))
final_model3.add(Dense(300, activation='relu'))
final_model3.add(Dropout(0.2))
final_model3.add(Dense(140, activation='relu'))
#final_model.add(Dropout(0.2))
final_model3.add(Dense(32))
final_model3.add(Dense(11, activation='softmax'))
final_model3.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=['accuracy'])
final_model4 = Sequential()
final_model4.add(merged)
final_model4.add(Dense(3200))
final_model4.add(Dropout(0.2))
final_model4.add(Dense(2800, activation='relu'))
final_model4.add(Dense(300, activation='relu'))
final_model4.add(Dropout(0.2))
final_model4.add(Dense(140, activation='relu'))
#final_model.add(Dropout(0.2))
final_model4.add(Dense(32))
final_model4.add(Dense(11, activation='softmax'))
final_model4.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=['accuracy'])
final_model5 = Sequential()
final_model5.add(merged)
final_model5.add(Dense(3200))
final_model5.add(Dropout(0.2))
final_model5.add(Dense(2800, activation='relu'))
final_model5.add(Dense(300, activation='relu'))
final_model5.add(Dropout(0.2))
final_model5.add(Dense(140, activation='relu'))
#final_model.add(Dropout(0.2))
final_model5.add(Dense(32))
final_model5.add(Dense(11, activation='softmax'))
final_model5.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True), metrics=['accuracy'])
X_train2=X_predict
X_test2=X_predict_test
print(X_train2.shape)
print(X_train1.shape)
final_model1.fit([X_train1,X_train2], Y2, nb_epoch=5, batch_size=32)
final_model2.fit([X_train1,X_train2], Y3, nb_epoch=5, batch_size=32)
final_model3.fit([X_train1,X_train2], Y4, nb_epoch=5, batch_size=32)
final_model4.fit([X_train1,X_train2], Y5, nb_epoch=5, batch_size=32)
final_model5.fit([X_train1,X_train2], Y6, nb_epoch=5, batch_size=32)
final_model1.fit([X_train1,X_train2], Y2, nb_epoch=2, batch_size=32)
final_model2.fit([X_train1,X_train2], Y3, nb_epoch=2, batch_size=32)
final_model3.fit([X_train1,X_train2], Y4, nb_epoch=2, batch_size=32)
final_model4.fit([X_train1,X_train2], Y5, nb_epoch=2, batch_size=32)
final_model5.fit([X_train1,X_train2], Y6, nb_epoch=2, batch_size=32)
print('started............')
scores = final_model1.evaluate([X_test1,X_test2], Y_2, batch_size=32)
print('Test loss and Test Accuracy', scores)
print('started............')
scores = final_model2.evaluate([X_test1,X_test2], Y_3, batch_size=32)
print('Test loss and Test Accuracy', scores)
print('started............')
scores = final_model3.evaluate([X_test1,X_test2], Y_4, batch_size=32)
print('Test loss and Test Accuracy', scores)
print('started............')
scores = final_model4.evaluate([X_test1,X_test2], Y_5, batch_size=32)
print('Test loss and Test Accuracy', scores)
print('started............')
scores = final_model5.evaluate([X_test1,X_test2], Y_6, batch_size=32)
print('Test loss and Test Accuracy', scores)
Providing More Training for Y2 and Y3
final_model1.fit([X_train1,X_train2], Y2, nb_epoch=2, batch_size=32)
final_model2.fit([X_train1,X_train2], Y3, nb_epoch=2, batch_size=32)
scores = final_model1.evaluate([X_test1,X_test2], Y_2, batch_size=64)
print('Test loss and Test Accuracy', scores[0],scores[1])
scores = final_model2.evaluate([X_test1,X_test2], Y_3, batch_size=64)
print('Test loss and Test Accuracy', scores[0],scores[1])
from os import listdir
from os.path import isfile, join
import numpy
import cv2
mypath='images/'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
print(onlyfiles)
images = numpy.empty(len(onlyfiles), dtype=object)
for n in range(0, len(onlyfiles)):
    images[n] = cv2.imread(join(mypath, onlyfiles[n]))
for i in range(0, 5):
    images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2GRAY)
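The reshape to (1, 32, 32) below assumes each captured image is already a 32x32 grayscale array; images of any other size would need resizing first, typically with `cv2.resize(img, (32, 32))`. A dependency-free nearest-neighbour sketch of that step:

```python
import numpy as np

def resize_nearest(img, size=32):
    # Nearest-neighbour resampling; cv2.resize is the usual choice,
    # this just avoids the OpenCV dependency for illustration.
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]

img = np.arange(64 * 48, dtype=np.float32).reshape(64, 48)
small = resize_nearest(img)
print(small.shape)
```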
import matplotlib.pyplot as plt
%matplotlib inline
import random
import math
from IPython.display import Image
from scipy import ndimage
def create_sequence(image):
    # Wrap one 32x32 grayscale image in a (1, 32, 32) array.
    dataset = np.ndarray(shape=(1, 32, 32), dtype=np.float32)
    dataset[0, :, :] = image
    return dataset

for i in range(0, 5):
    dataset = create_sequence(images[i])
    plt.figure()
    plt.imshow(dataset[0])
    plt.show()
hand_dataset = dataset[0].reshape(1,32,32,1).astype('float32')
X_predict=re_model.predict(hand_dataset)
classes1 = final_model1.predict([hand_dataset,X_predict], batch_size=2,verbose=0)
classes2 = final_model2.predict([hand_dataset,X_predict], batch_size=2,verbose=0)
classes3 = final_model3.predict([hand_dataset,X_predict], batch_size=2,verbose=0)
classes4 = final_model4.predict([hand_dataset,X_predict], batch_size=2,verbose=0)
classes5 = final_model5.predict([hand_dataset,X_predict], batch_size=2,verbose=0)
print(classes1)
print(classes2)
print(classes3)
print(classes4)
print(classes5)
How well does your model localize numbers on the testing set from the realistic dataset? Do your classification results change at all with localization included?
The model localizes numbers reasonably well on the testing set.
The classification results do not improve much with localization included. Before localization, the per-position accuracies were
0.82774716863801512, 0.79025099477820493, 0.93036424856431121, 0.99051117232935415, 0.99984695439240889
and after localization they were
0.754087025038, 0.854087025038, 0.82943067033976126, 0.98867462503826142, 0.99984695439240889
so overall the classification results did not improve.
Test the localization function on the images you captured in Step 3. Does the model accurately calculate a bounding box for the numbers in the images you found? If you did not use a graphical interface, you may need to investigate the bounding boxes by hand. Provide an example of the localization created on a captured image.
I tested the localization function on the captured images.
The model calculated the bounding boxes only approximately, not accurately. For example, for the second image shown above it predicted
'height': 22.0, 'width': 26.0, 'top': 6.0, 'left': 3.0, 'label': 1.0
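One way to make "approximately" quantitative is the intersection-over-union between a predicted box and a hand-measured one. A minimal sketch, using the predicted box above and a hypothetical ground-truth box:

```python
def iou(box_a, box_b):
    # Boxes as (top, left, height, width); returns intersection-over-union.
    top = max(box_a[0], box_b[0])
    left = max(box_a[1], box_b[1])
    bottom = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    right = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0, bottom - top) * max(0, right - left)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / float(union)

pred = (6, 3, 22, 26)   # the predicted box reported above
true = (5, 4, 24, 24)   # hypothetical hand-measured box
print(round(iou(pred, true), 3))
```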